A Framework to Adjust Dependency Measure Estimates for Chance

Authors

  • Simone Romano
  • Nguyen Xuan Vinh
  • James Bailey
  • Karin M. Verspoor
Abstract

Estimating the strength of dependency between two variables is fundamental for exploratory analysis and many other applications in data mining. For example, non-linear dependencies between two continuous variables can be explored with the Maximal Information Coefficient (MIC), and categorical variables that are dependent on the target class are selected using Gini gain in random forests. Nonetheless, because dependency measures are estimated on finite samples, interpreting their values and ranking dependencies accurately become challenging. Dependency estimates are not equal to 0 when variables are independent, cannot be compared if computed on different sample sizes, and are inflated by chance on variables with more categories. In this paper, we propose a framework to adjust dependency measure estimates on finite samples. Our adjustments, which are simple and applicable to any dependency measure, improve interpretability when quantifying dependency and accuracy when ranking dependencies. In particular, we demonstrate that our approach enhances the interpretability of MIC when it is used as a proxy for the amount of noise between variables, and improves accuracy when ranking variables during the splitting procedure in random forests.
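The abstract names the three finite-sample problems but this page does not give the estimators themselves. As an illustration only, the sketch below assumes a permutation null (shuffling one variable to break any dependency) and uses plug-in mutual information as the dependency measure; the function names `adjust_for_chance` and `standardize_for_chance`, the number of permutations, and the choice of upper bound are all assumptions made here, not the authors' implementation. One variant subtracts the expected value under independence and rescales (aimed at quantification); the other divides by the null standard deviation (aimed at ranking variables with different numbers of categories).

```python
import math
import random
from collections import Counter


def entropy(xs):
    """Plug-in entropy (in nats) of a categorical sequence."""
    n = len(xs)
    return -sum((c / n) * math.log(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """Plug-in mutual information (in nats) between two categorical sequences."""
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in Counter(zip(xs, ys)).items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), with counts in place of probabilities
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi


def null_estimates(measure, xs, ys, n_perm=200, seed=0):
    """Values of `measure` after shuffling ys, i.e. under simulated independence."""
    rng = random.Random(seed)
    shuffled = list(ys)
    values = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        values.append(measure(xs, shuffled))
    return values


def adjust_for_chance(measure, xs, ys, upper_bound, n_perm=200, seed=0):
    """(observed - null mean) / (upper_bound - null mean): about 0 under independence, 1 at the bound."""
    null = null_estimates(measure, xs, ys, n_perm, seed)
    expected = sum(null) / len(null)
    if upper_bound - expected <= 0:
        return 0.0  # degenerate case, e.g. a constant variable
    return (measure(xs, ys) - expected) / (upper_bound - expected)


def standardize_for_chance(measure, xs, ys, n_perm=200, seed=0):
    """(observed - null mean) / null standard deviation, intended for ranking."""
    null = null_estimates(measure, xs, ys, n_perm, seed)
    expected = sum(null) / len(null)
    std = math.sqrt(sum((v - expected) ** 2 for v in null) / len(null))
    if std == 0:
        return 0.0  # the null distribution has no spread
    return (measure(xs, ys) - expected) / std


if __name__ == "__main__":
    rng = random.Random(1)
    target = [rng.randrange(2) for _ in range(100)]
    # Two features that are independent of the target by construction.
    features = {
        "2 categories": [rng.randrange(2) for _ in range(100)],
        "20 categories": [rng.randrange(20) for _ in range(100)],
    }
    for name, feature in features.items():
        bound = min(entropy(feature), entropy(target))
        print(name,
              "raw:", mutual_information(feature, target),
              "adjusted:", adjust_for_chance(mutual_information, feature, target, bound),
              "standardized:", standardize_for_chance(mutual_information, feature, target))
```

Running the script prints the raw, adjusted, and standardized values for two features that are independent of the target, which makes it possible to see how much of the raw estimate is attributable to the number of categories alone. Any other dependency measure with the same `(xs, ys)` signature can be passed in place of `mutual_information`.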

Similar resources

Design and adjustment of dependency measures

Dependency measures are fundamental for a number of important applications in data mining and machine learning. They are ubiquitously used: for feature selection, for clustering comparisons and validation, as splitting criteria in random forest, and to infer biological networks, to list a few. More generally, there are three important applications of dependency measures: detection, quantificati...


The meaning of kappa: probabilistic concepts of reliability and validity revisited.

A framework--the "agreement concept"--is developed to study the use of Cohen's kappa as well as alternative measures of chance-corrected agreement in a unified manner. Focusing on intrarater consistency it is demonstrated that for 2 x 2 tables an adequate choice between different measures of chance-corrected agreement can be made only if the characteristics of the observational setting are take...


Development and validation of the Sexual Knowledge and Attitude Scale

The main purpose of this study was to present an account of the development of the Sexual Knowledge and Attitude Scale (SKAS) and to examine its psychometric properties, including construct validity, convergent and discriminant validity, internal consistency, and test-retest reliability. Eight hundred and thirty seven Iranian men and women (385 men, 451 women) participated in this study, voluntari...


Data envelopment analysis in service quality evaluation: an empirical study

Service quality is often conceptualized as the comparison between service expectations and the actual performance perceptions. It enhances customer satisfaction, decreases customer defection, and promotes customer loyalty. Substantial literature has examined the concept of service quality, its dimensions, and measurement methods. We introduce the perceived service quality index (PSQI) as a sing...


Empirical estimates for various correlations in longitudinal-dynamic heteroscedastic hierarchical normal models

In this paper, we first define longitudinal-dynamic heteroscedastic hierarchical normal models. These models can be used to fit longitudinal data in which the dependency structure is constructed through a dynamic model rather than observations. We discuss different methods for estimating the hyper-parameters. Then the corresponding estimates for the hyper-parameter that causes the association...


Publication date: 2016